C# and reading file contents

Here's a little background on what I'm trying to do.

I've got an HTML file with sections in it identified by divs, like so:

Code:
<HTML>
   <HEAD><TITLE>This Page</TITLE></HEAD>
   <BODY>
       <Div id="Section1">
            lar lar lar lar lar
       </div>
       <Div id="Section2">
            lar lar lar lar lar
       </div>
       <Div id="Section3">
            lar lar lar lar lar
       </div>
       <Div id="Section4">
            lar lar lar lar lar
       </div>
   </BODY>
</HTML>

This HTML file is called something like test.htm.

Now in C# I'm trying to read the contents of this HTML file but exclude certain sections depending on variables I've got.
e.g. in the C# code I might have an array of booleans like:

Section1 = true
Section2 = false
Section3 = true
Section4 = false

I need to read the contents of the HTML file, but in the variable that holds the read data I need the sections that are marked false removed.

Any ideas on this?
 
Finally got round to giving this a go, and had to do it the following way because XmlTextReader didn't like the file contents.

This is the code, but for some reason it's not removing the divs that I don't need.

What happens is that printJobFileName contains a query string with entries such as Section_0, Section_1 etc., and the HTML file has divs with the same IDs. This code should loop through and remove the divs that are set to false in the query string.

Any ideas?

Code:
private byte [] extraxtContents(byte[] data, string printJobFileName)
    {
        Uri uri = new Uri(printJobFileName);

        string html = System.Text.ASCIIEncoding.ASCII.GetString(data);

        bool finished = false;
        int startIndex = html.IndexOf( "<div" );
        sEvent = startIndex.ToString();
        int endIndex;
        // split the query string on the '&' character
        string [] queryValues = uri.Query.Split( '&' );
        // iterate thru all div tags and check id attr against query string values
        while( startIndex > 0 )
        {
            // find end of opening div tag
            endIndex = html.IndexOf(">", startIndex);
            if (endIndex > 0)
            {
                // iterate thru all query string values
                foreach (string query in queryValues)
                {
                    if (query.Split('=').Length < 2)
                        continue;
                    string sectionName = query.Split('=')[0];
                    bool enabled = bool.Parse( query.Split('=')[1] );

                    if (!enabled)   // is this section disabled?
                    {
                        // does this div's id match the query string value
                        int idIndex = html.IndexOf(sectionName, startIndex);
                        if (idIndex < endIndex) // yes
                        {
                            // remove this div tag from HTML
                            endIndex = html.IndexOf("</div>", startIndex);
                            if (endIndex > 0)
                            {
                                string divTag = html.Substring(startIndex, endIndex + 5);
                                html.Replace(divTag, "");
                                break;
                            }
                        }
                    }
                }                            
                // find next div tag
                startIndex = html.IndexOf("<div", startIndex+1 );
            }
        }

 

        return ASCIIEncoding.ASCII.GetBytes(html);
    }
 
Little update: it appears that it's the line
Uri uri = new Uri(printJobFileName);
which is the problem, because the Uri constructor is converting the '?' character in the URL to '%3F', which is screwing things up when the Uri tries to separate out the query string.

Has anyone had this problem before and know how to sort it? Googling is throwing up nothing.
 
You need to use something like...

int searchSoft = -1;
string str = "%3F";

searchSoft = SoftName.IndexOf(str, StringComparison.OrdinalIgnoreCase);

if (searchSoft >= 0)
{
    // strings are immutable, so the result of Replace has to be assigned back
    SoftName = SoftName.Replace("%3F", "?");
}

Do you get where I'm coming from?

Stelly
 
In the end I just scrapped the use of Uri, split the whole URL string on '?' and took the second element in the array.
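
For anyone landing here later, the split-on-'?' approach can be sketched roughly as below. This is only an illustration, not the code that was actually used: the class name QueryHelper, the method name ParseSectionFlags and the sample path in the usage line are made up for the example, and it assumes the query string is a plain list of name=true/false pairs separated by '&'.

```csharp
using System;
using System.Collections.Generic;

static class QueryHelper
{
    // Hypothetical helper: maps each section name to its enabled flag,
    // using only the text after the first '?' in the given string.
    public static Dictionary<string, bool> ParseSectionFlags(string printJobFileName)
    {
        var flags = new Dictionary<string, bool>(StringComparer.OrdinalIgnoreCase);

        // Split into at most two parts so any later '?' stays in the query part.
        string[] parts = printJobFileName.Split(new[] { '?' }, 2);
        if (parts.Length < 2)
            return flags; // no query string at all

        foreach (string pair in parts[1].Split('&'))
        {
            string[] kv = pair.Split('=');
            bool enabled;
            // Skip anything that is not a well-formed name=true/false pair.
            if (kv.Length == 2 && bool.TryParse(kv[1], out enabled))
                flags[kv[0]] = enabled;
        }
        return flags;
    }
}
```

Calling QueryHelper.ParseSectionFlags(@"C:\jobs\test.htm?Section_0=true&Section_1=false") should give two entries, Section_0 as true and Section_1 as false. Using bool.TryParse instead of bool.Parse means a malformed value is skipped rather than throwing.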

There were also a few other problems: the result of Replace was not being assigned back to the html variable, and the Substring call was wrong, as I was passing endIndex as the second parameter when it needs to be the number of characters from startIndex.

And finally I had to add some more validation to 'if (idIndex < endIndex)', because if the div had been removed and the id was not found then -1 was returned, so it ended up removing everything after that point.
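
A rough sketch of the loop with those fixes applied, in case it helps someone. It is not the exact code I ended up with: the names DivStripper and RemoveDisabledDivs are invented for the example, it takes the list of disabled section names directly instead of parsing the query string, it uses string.Remove in place of Substring plus Replace, and it assumes the divs are not nested.

```csharp
using System;
using System.Collections.Generic;

static class DivStripper
{
    // Hypothetical helper: removes every <div> whose opening tag mentions
    // the name of a disabled section. Assumes divs are not nested.
    public static string RemoveDisabledDivs(string html, IEnumerable<string> disabledSections)
    {
        const string closeTag = "</div>";
        int startIndex = html.IndexOf("<div", StringComparison.OrdinalIgnoreCase);

        while (startIndex >= 0)
        {
            // find the end of the opening div tag
            int tagEnd = html.IndexOf('>', startIndex);
            if (tagEnd < 0)
                break; // malformed opening tag, stop scanning

            bool removed = false;
            foreach (string section in disabledSections)
            {
                int idIndex = html.IndexOf(section, startIndex, StringComparison.OrdinalIgnoreCase);
                // IndexOf returns -1 when nothing is found, so rule that out first
                if (idIndex < 0 || idIndex > tagEnd)
                    continue;

                int closeIndex = html.IndexOf(closeTag, tagEnd, StringComparison.OrdinalIgnoreCase);
                if (closeIndex < 0)
                    break; // no closing tag, leave this div alone

                // Remove takes a start and a character count (not an end index),
                // and the new string has to be assigned back
                int length = closeIndex + closeTag.Length - startIndex;
                html = html.Remove(startIndex, length);
                removed = true;
                break;
            }

            // after a removal the next tag may now sit at startIndex, so rescan from there
            int next = removed ? startIndex : startIndex + 1;
            startIndex = html.IndexOf("<div", next, StringComparison.OrdinalIgnoreCase);
        }
        return html;
    }
}
```

One thing to watch with this kind of plain substring matching: a name like Section_1 also matches a div whose id is Section_10, so the id would need a stricter comparison if the section numbers go into double digits.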

Thanks for the help guys.

Also, will there be any issues with using Split on the '?' instead of using Uri?
 